Proactive Learning for Building Machine Translation Systems for Minority Languages

Authors

  • Vamshi Ambati
  • Jaime G. Carbonell
Abstract

Building machine translation (MT) systems for the world's many minority languages is a serious challenge. For many minority languages there is little machine-readable text, few knowledgeable linguists, and little money available for MT development. For these reasons, it is very important for an MT system to make the best use of its resources, both labeled and unlabeled, in building a quality system. In this paper we argue that the traditional active learning setup may not be the right fit for seeking the annotations required to build a syntax-based MT system for minority languages. We posit that a relatively new variant of active learning, proactive learning, is more suitable for this task.

Similar resources

Linguistic Structure and Bilingual Informants Help Induce Machine Translation of Lesser-Resourced Languages

Producing machine translation (MT) for the many minority languages in the world is a serious challenge. Minority languages typically have few resources for building MT systems. For many minor languages there is little machine readable text, few knowledgeable linguists, and little money available for MT development. For these reasons, our research programs on minority language MT have focused on...

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are central to Machine Translation (MT) engines, as the engines are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

Building Language Resources and Translation Models for Machine Translation Focused on South Slavic and Balkan Languages

The aim of this short-term project was to investigate the feasibility of machine translation (MT) research and development for several South Slavic and Balkan languages, more precisely Romanian, Bulgarian, Slovene, Greek and Serbian. For these languages, MT systems are scarce and for some of them even non-existent. We provide a brief description of the project’s major research tasks: Compilatio...

EuskoParl: a speech and text Spanish-Basque parallel corpus

Advances in corpus-based approaches and machine learning techniques have promoted the development of minority languages. The contribution of this work is the acquisition of a parallel corpus in Spanish and Basque with both text and speech data. In order to be able to compare the systems with those developed for other languages, the Europarl corpus was taken as a reference in both domain and size. The a...

Rule-based Breton to French machine translation

This paper describes a rule-based machine translation system from Breton to French intended for producing gisting translations. The paper presents a summary of the ongoing development of the system, along with an evaluation of two versions, and some reflection on the use of MT systems for lesser-resourced or minority languages.

Publication date: 2009